When we first discussed topics for our project, we narrowed our choices
down to environmental data. We were interested in seeing possible trends
over time, and the vast quantity of environmental data that is available
piqued our curiosity.
Packages Required
#This will allow us to filter through our data
library(tidyverse)
library(dplyr)
#This will help us plot figures to showcase our findings
library(ggplot2)
#This will help us organize and display our data as necessary
library(knitr)
library(kableExtra)
#This expands our plot uses
library(plotly)
#Scientific Notation Disabled (a large penalty keeps numbers in fixed notation)
options(scipen = 999)
Deaths Data
We were excited to base our report on this data because it was
relatively tidy and had quite a few categorical variables and options
for additional columns to graph.
Our deaths-due-to-air-pollution data set is from Kaggle. The author is
Akshat Giri, and the data set was last updated 2 years ago, so it is
fairly recent.
When we first loaded the data, some of the column names were lengthy,
so we shortened them to: country, acronym, year, total deaths, indoor
deaths, outdoor deaths, and ozone deaths.
Import the deaths-due-to-air-pollution data
#read.csv() already returns a data frame, so no extra wrapper is needed
deaths_df_old <- read.csv("death-rates-from-air-pollution.csv")
glimpse(deaths_df_old)
## Rows: 6,468
## Columns: 7
## $ Entity <chr> "Afghanistan", "Afghan…
## $ Code <chr> "AFG", "AFG", "AFG", "…
## $ Year <int> 1990, 1991, 1992, 1993…
## $ Air.pollution..total...deaths.per.100.000. <dbl> 299.4773, 291.2780, 27…
## $ Indoor.air.pollution..deaths.per.100.000. <dbl> 250.3629, 242.5751, 23…
## $ Outdoor.particulate.matter..deaths.per.100.000. <dbl> 46.44659, 46.03384, 44…
## $ Outdoor.ozone.pollution..deaths.per.100.000. <dbl> 5.616442, 5.603960, 5.…
We are going to rename a few of the columns and glimpse the data
deaths_df <- deaths_df_old %>%
rename(
country = Entity,
acronym = Code,
year = Year,
total_deaths = Air.pollution..total...deaths.per.100.000.,
indoor_deaths = Indoor.air.pollution..deaths.per.100.000.,
outdoor_deaths = Outdoor.particulate.matter..deaths.per.100.000.,
ozone_deaths = Outdoor.ozone.pollution..deaths.per.100.000.
)
glimpse(deaths_df)
## Rows: 6,468
## Columns: 7
## $ country <chr> "Afghanistan", "Afghanistan", "Afghanistan", "Afghanist…
## $ acronym <chr> "AFG", "AFG", "AFG", "AFG", "AFG", "AFG", "AFG", "AFG",…
## $ year <int> 1990, 1991, 1992, 1993, 1994, 1995, 1996, 1997, 1998, 1…
## $ total_deaths <dbl> 299.4773, 291.2780, 278.9631, 278.7908, 287.1629, 288.0…
## $ indoor_deaths <dbl> 250.3629, 242.5751, 232.0439, 231.6481, 238.8372, 239.9…
## $ outdoor_deaths <dbl> 46.44659, 46.03384, 44.24377, 44.44015, 45.59433, 45.36…
## $ ozone_deaths <dbl> 5.616442, 5.603960, 5.611822, 5.655266, 5.718922, 5.739…
Data Variables
Variables that interest us here include:
- country
- total_deaths: total deaths from air pollution, per 100,000
- indoor_deaths: Indoor air pollution is pollution that occurs in the
household, mainly from cooking with solid fuels:
  - Wood
  - Crop waste, dung
  - Charcoal, coal
This method of cooking is commonly seen in underdeveloped countries.
- outdoor_deaths: Outdoor (ambient) air pollution consists of emissions
caused by combustion processes from motor vehicles, solid-fuel burning,
and industry:
  - Ozone (O3)
  - Particulate matter (PM10 and PM2.5)
  - Nitrogen dioxide (NO2)
  - Carbon monoxide (CO)
  - Sulfur dioxide (SO2)
The data set also takes a closer look at deaths caused by ozone itself,
which is considered a component of outdoor air pollution.
- ozone_deaths: Ozone is a gas that occurs both in Earth’s upper
atmosphere and at ground level. Ozone in the upper atmosphere is helpful
because it shields the Earth from ultraviolet radiation, but ground-level
ozone is a harmful pollutant that forms when emissions from fossil-fuel
use react in sunlight. Sources of those emissions include:
  - Pollutants emitted by cars
  - Power plants, industrial boilers, refineries, chemical plants
World Population Data
The world population data set is also from Kaggle. The author is
Devakumar K. P., and the data set was last updated 2 years ago, so it is
also recent.
From a glimpse of the data set you can see that the columns are country
name, year, and count, which refers to the population at that time.
Now, let’s take a look at the population data.
world_pop <- read.csv("population_total_long.csv")
glimpse(world_pop)
## Rows: 12,595
## Columns: 3
## $ Country.Name <chr> "Aruba", "Afghanistan", "Angola", "Albania", "Andorra", "…
## $ Year <int> 1960, 1960, 1960, 1960, 1960, 1960, 1960, 1960, 1960, 196…
## $ Count <int> 54211, 8996973, 5454933, 1608800, 13411, 92418, 20481779,…
To get a general idea of the deaths data frame we made, let’s make a
plot to see what’s happening. This is a plot of indoor versus outdoor
deaths around the world by country.
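The plotting code is not shown in this report. As a minimal sketch of how such a scatter plot might be drawn, the block below uses a small stand-in data frame (`toy_deaths`, a hypothetical name) holding the first three Afghanistan rows from the glimpse output; with the real `deaths_df` the same call would plot every country-year.

```r
library(ggplot2)

# Toy stand-in for deaths_df: the first three Afghanistan rows from the
# glimpse output above (the real data frame has 6,468 rows).
toy_deaths <- data.frame(
  country        = c("Afghanistan", "Afghanistan", "Afghanistan"),
  year           = c(1990, 1991, 1992),
  indoor_deaths  = c(250.3629, 242.5751, 232.0439),
  outdoor_deaths = c(46.44659, 46.03384, 44.24377)
)

# One point per country-year; colour distinguishes countries.
indoor_vs_outdoor <- ggplot(
  toy_deaths,
  aes(x = indoor_deaths, y = outdoor_deaths, color = country)
) +
  geom_point() +
  labs(
    title = "Indoor vs. Outdoor Air Pollution Deaths (per 100,000)",
    x = "Indoor deaths",
    y = "Outdoor deaths"
  )

indoor_vs_outdoor
```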
This is a mess, so we chose two countries from each continent (a
high-population and a low-population country) to graph.
We wanted a consistent gap between our high- and low-population
selections, so we came up with a formula: the low-population country
should have roughly 0.10 times the population of the high-population
country. For example, when we chose the U.S. (which had a population of
331002651 at the time the data was recorded), we multiplied this number
by 0.10 to get 33100265.1 and looked for the country whose population
most closely matched (in this case it was Canada, with 37742154).
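The selection rule can be sketched in a few lines of base R. This is only an illustration of the idea, not the code used for the report: the two "Hypothetical" rows are invented to give the search something to reject, and a real run would restrict the candidates to countries on the same continent.

```r
# U.S. and Canada figures are the ones quoted in the text; the two
# "Hypothetical" rows are made up for illustration.
pop <- data.frame(
  country = c("United States", "Canada", "Hypothetical A", "Hypothetical B"),
  count   = c(331002651, 37742154, 5000000, 90000000)
)

# Target size for the low-population partner: 10% of the high-population country.
target <- 0.10 * pop$count[pop$country == "United States"]

# Every other country is a candidate; keep the one with the smallest gap.
candidates <- pop[pop$country != "United States", ]
closest <- candidates$country[which.min(abs(candidates$count - target))]
closest
```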
We purposefully left out countries whose populations were far larger
than most (Russia, India, and China) because we didn’t want those
countries to skew the data.
| Country.Name | Year | Count |
|---|---|---|
| Australia | 1996 | 18311000 |
| Brazil | 1996 | 164614688 |
| Germany | 1996 | 81914831 |
| Nigeria | 1996 | 110668794 |
| Pakistan | 1996 | 127349290 |
| United States | 1996 | 269394000 |

| Country.Name | Year | Count |
|---|---|---|
| Canada | 1996 | 29610218 |
| Chile | 1996 | 14587370 |
| Sri Lanka | 1996 | 18367288 |
| Malawi | 1996 | 10022789 |
| New Zealand | 1996 | 3732000 |
| Serbia | 1996 | 7617794 |
Continents:
- North America: U.S, Canada
- South America: Brazil, Chile
- Africa: Nigeria, Malawi
- Europe: Germany, Serbia
- Asia: Pakistan, Sri Lanka
- Oceania: Australia, New Zealand
Combine Data Sets
First let’s look at a table of the high and low populated countries
using the world population data set.
| Country.Name | Year | Count |
|---|---|---|
| Australia | 1996 | 18311000 |
| Brazil | 1996 | 164614688 |
| Germany | 1996 | 81914831 |
| Nigeria | 1996 | 110668794 |
| Pakistan | 1996 | 127349290 |
| United States | 1996 | 269394000 |

| Country.Name | Year | Count |
|---|---|---|
| Canada | 1996 | 29610218 |
| Chile | 1996 | 14587370 |
| Sri Lanka | 1996 | 18367288 |
| Malawi | 1996 | 10022789 |
| New Zealand | 1996 | 3732000 |
| Serbia | 1996 | 7617794 |
Next, we are going to see the death count for high and low populated
countries using the deaths dataframe.
| country | acronym | year | total_deaths | indoor_deaths | outdoor_deaths | ozone_deaths |
|---|---|---|---|---|---|---|
| Australia | AUS | 1996 | 23.04465 | 0.3585034 | 22.407071 | 0.3249375 |
| Australia | AUS | 1997 | 22.43025 | 0.3222224 | 21.838737 | 0.3141838 |
| Australia | AUS | 1998 | 21.50529 | 0.2839769 | 20.960276 | 0.3048918 |
| Australia | AUS | 1999 | 20.40911 | 0.2590092 | 19.897091 | 0.2953354 |
| Australia | AUS | 2000 | 19.39822 | 0.2398763 | 18.909240 | 0.2899216 |
| Australia | AUS | 2001 | 18.58572 | 0.2234341 | 18.118700 | 0.2836469 |
| Australia | AUS | 2002 | 18.11849 | 0.2105980 | 17.662269 | 0.2859938 |
| Australia | AUS | 2003 | 17.23830 | 0.1937083 | 16.802536 | 0.2816949 |
| Australia | AUS | 2004 | 16.34770 | 0.1760229 | 15.932077 | 0.2785466 |
| Australia | AUS | 2005 | 15.41337 | 0.1599279 | 15.016089 | 0.2757150 |
| Australia | AUS | 2006 | 14.92239 | 0.1496469 | 14.530223 | 0.2819060 |
| Australia | AUS | 2007 | 14.92140 | 0.1449723 | 14.514884 | 0.3042005 |
| Australia | AUS | 2008 | 14.64683 | 0.1383225 | 14.228709 | 0.3254648 |
| Australia | AUS | 2009 | 14.11563 | 0.1259313 | 13.694572 | 0.3431982 |
| Australia | AUS | 2010 | 13.57171 | 0.1174834 | 13.140380 | 0.3647233 |
| Australia | AUS | 2011 | 13.72763 | 0.1119247 | 13.276676 | 0.3956796 |
| Australia | AUS | 2012 | 12.65973 | 0.1018626 | 12.196401 | 0.4192914 |
| Australia | AUS | 2013 | 11.87449 | 0.0973836 | 11.384154 | 0.4530427 |
| Australia | AUS | 2014 | 11.47268 | 0.0931036 | 10.939491 | 0.5037056 |
| Australia | AUS | 2015 | 11.27679 | 0.0886376 | 10.702072 | 0.5544068 |
| Australia | AUS | 2016 | 10.58644 | 0.0844017 | 9.974549 | 0.5955779 |
| Australia | AUS | 2017 | 10.79595 | 0.0833628 | 10.128111 | 0.6592419 |

| country | acronym | year | total_deaths | indoor_deaths | outdoor_deaths | ozone_deaths |
|---|---|---|---|---|---|---|
| Canada | CAN | 1996 | 22.18101 | 0.0946226 | 20.155243 | 2.192488 |
| Canada | CAN | 1997 | 21.92768 | 0.0877542 | 19.908473 | 2.195940 |
| Canada | CAN | 1998 | 21.65538 | 0.0824492 | 19.634839 | 2.205681 |
| Canada | CAN | 1999 | 21.17703 | 0.0751278 | 19.179045 | 2.189426 |
| Canada | CAN | 2000 | 20.26486 | 0.0681836 | 18.326999 | 2.127733 |
| Canada | CAN | 2001 | 19.82451 | 0.0641108 | 17.938427 | 2.076464 |
| Canada | CAN | 2002 | 19.52428 | 0.0604824 | 17.669133 | 2.047603 |
| Canada | CAN | 2003 | 19.17033 | 0.0564743 | 17.338627 | 2.026864 |
| Canada | CAN | 2004 | 18.40919 | 0.0513588 | 16.629516 | 1.973025 |
| Canada | CAN | 2005 | 17.79268 | 0.0481667 | 16.030102 | 1.954712 |
| Canada | CAN | 2006 | 17.14391 | 0.0447622 | 15.445519 | 1.888735 |
| Canada | CAN | 2007 | 16.93196 | 0.0435468 | 15.229981 | 1.895259 |
| Canada | CAN | 2008 | 16.51814 | 0.0407468 | 14.829238 | 1.883242 |
| Canada | CAN | 2009 | 15.76760 | 0.0380831 | 14.118647 | 1.838920 |
| Canada | CAN | 2010 | 14.88338 | 0.0340653 | 13.281852 | 1.786430 |
| Canada | CAN | 2011 | 14.59934 | 0.0319160 | 13.030477 | 1.756998 |
| Canada | CAN | 2012 | 13.82968 | 0.0307105 | 12.243601 | 1.764727 |
| Canada | CAN | 2013 | 12.97501 | 0.0288027 | 11.410021 | 1.733997 |
| Canada | CAN | 2014 | 12.61872 | 0.0276959 | 11.032571 | 1.746991 |
| Canada | CAN | 2015 | 12.21793 | 0.0270578 | 10.609097 | 1.763895 |
| Canada | CAN | 2016 | 11.00267 | 0.0251286 | 9.397502 | 1.740834 |
| Canada | CAN | 2017 | 10.71662 | 0.0247705 | 9.110733 | 1.739718 |
Lastly, we will join the population and deaths data by country and
year. The last column, ‘Count’, is the population of that country in
that year.
| country | acronym | year | total_deaths | indoor_deaths | outdoor_deaths | ozone_deaths | Count |
|---|---|---|---|---|---|---|---|
| Australia | AUS | 1996 | 23.04465 | 0.3585034 | 22.407071 | 0.3249375 | 18311000 |
| Australia | AUS | 1997 | 22.43025 | 0.3222224 | 21.838737 | 0.3141838 | 18517000 |
| Australia | AUS | 1998 | 21.50529 | 0.2839769 | 20.960276 | 0.3048918 | 18711000 |
| Australia | AUS | 1999 | 20.40911 | 0.2590092 | 19.897091 | 0.2953354 | 18926000 |
| Australia | AUS | 2000 | 19.39822 | 0.2398763 | 18.909240 | 0.2899216 | 19153000 |
| Australia | AUS | 2001 | 18.58572 | 0.2234341 | 18.118700 | 0.2836469 | 19413000 |
| Australia | AUS | 2002 | 18.11849 | 0.2105980 | 17.662269 | 0.2859938 | 19651400 |
| Australia | AUS | 2003 | 17.23830 | 0.1937083 | 16.802536 | 0.2816949 | 19895400 |
| Australia | AUS | 2004 | 16.34770 | 0.1760229 | 15.932077 | 0.2785466 | 20127400 |
| Australia | AUS | 2005 | 15.41337 | 0.1599279 | 15.016089 | 0.2757150 | 20394800 |
| Australia | AUS | 2006 | 14.92239 | 0.1496469 | 14.530223 | 0.2819060 | 20697900 |
| Australia | AUS | 2007 | 14.92140 | 0.1449723 | 14.514884 | 0.3042005 | 20827600 |
| Australia | AUS | 2008 | 14.64683 | 0.1383225 | 14.228709 | 0.3254648 | 21249200 |
| Australia | AUS | 2009 | 14.11563 | 0.1259313 | 13.694572 | 0.3431982 | 21691700 |
| Australia | AUS | 2010 | 13.57171 | 0.1174834 | 13.140380 | 0.3647233 | 22031750 |
| Australia | AUS | 2011 | 13.72763 | 0.1119247 | 13.276676 | 0.3956796 | 22340024 |
| Australia | AUS | 2012 | 12.65973 | 0.1018626 | 12.196401 | 0.4192914 | 22733465 |
| Australia | AUS | 2013 | 11.87449 | 0.0973836 | 11.384154 | 0.4530427 | 23128129 |
| Australia | AUS | 2014 | 11.47268 | 0.0931036 | 10.939491 | 0.5037056 | 23475686 |
| Australia | AUS | 2015 | 11.27679 | 0.0886376 | 10.702072 | 0.5544068 | 23815995 |
| Australia | AUS | 2016 | 10.58644 | 0.0844017 | 9.974549 | 0.5955779 | 24190907 |
| Australia | AUS | 2017 | 10.79595 | 0.0833628 | 10.128111 | 0.6592419 | 24601860 |

| country | acronym | year | total_deaths | indoor_deaths | outdoor_deaths | ozone_deaths | Count |
|---|---|---|---|---|---|---|---|
| Canada | CAN | 1996 | 22.18101 | 0.0946226 | 20.155243 | 2.192488 | 29610218 |
| Canada | CAN | 1997 | 21.92768 | 0.0877542 | 19.908473 | 2.195940 | 29905948 |
| Canada | CAN | 1998 | 21.65538 | 0.0824492 | 19.634839 | 2.205681 | 30155173 |
| Canada | CAN | 1999 | 21.17703 | 0.0751278 | 19.179045 | 2.189426 | 30401286 |
| Canada | CAN | 2000 | 20.26486 | 0.0681836 | 18.326999 | 2.127733 | 30685730 |
| Canada | CAN | 2001 | 19.82451 | 0.0641108 | 17.938427 | 2.076464 | 31020902 |
| Canada | CAN | 2002 | 19.52428 | 0.0604824 | 17.669133 | 2.047603 | 31360079 |
| Canada | CAN | 2003 | 19.17033 | 0.0564743 | 17.338627 | 2.026864 | 31644028 |
| Canada | CAN | 2004 | 18.40919 | 0.0513588 | 16.629516 | 1.973025 | 31940655 |
| Canada | CAN | 2005 | 17.79268 | 0.0481667 | 16.030102 | 1.954712 | 32243753 |
| Canada | CAN | 2006 | 17.14391 | 0.0447622 | 15.445519 | 1.888735 | 32571174 |
| Canada | CAN | 2007 | 16.93196 | 0.0435468 | 15.229981 | 1.895259 | 32889025 |
| Canada | CAN | 2008 | 16.51814 | 0.0407468 | 14.829238 | 1.883242 | 33247118 |
| Canada | CAN | 2009 | 15.76760 | 0.0380831 | 14.118647 | 1.838920 | 33628895 |
| Canada | CAN | 2010 | 14.88338 | 0.0340653 | 13.281852 | 1.786430 | 34004889 |
| Canada | CAN | 2011 | 14.59934 | 0.0319160 | 13.030477 | 1.756998 | 34339328 |
| Canada | CAN | 2012 | 13.82968 | 0.0307105 | 12.243601 | 1.764727 | 34714222 |
| Canada | CAN | 2013 | 12.97501 | 0.0288027 | 11.410021 | 1.733997 | 35082954 |
| Canada | CAN | 2014 | 12.61872 | 0.0276959 | 11.032571 | 1.746991 | 35437435 |
| Canada | CAN | 2015 | 12.21793 | 0.0270578 | 10.609097 | 1.763895 | 35702908 |
| Canada | CAN | 2016 | 11.00267 | 0.0251286 | 9.397502 | 1.740834 | 36109487 |
| Canada | CAN | 2017 | 10.71662 | 0.0247705 | 9.110733 | 1.739718 | 36540268 |
We also looked at how the data varied by continent.
joined_all <- right_join(deaths_df, world_pop, by=c('country' = 'Country.Name', 'year' = 'Year'))
head(joined_all)
## country acronym year total_deaths indoor_deaths outdoor_deaths
## 1 Afghanistan AFG 1990 299.4773 250.3629 46.44659
## 2 Afghanistan AFG 1991 291.2780 242.5751 46.03384
## 3 Afghanistan AFG 1992 278.9631 232.0439 44.24377
## 4 Afghanistan AFG 1993 278.7908 231.6481 44.44015
## 5 Afghanistan AFG 1994 287.1629 238.8372 45.59433
## 6 Afghanistan AFG 1995 288.0142 239.9066 45.36714
## ozone_deaths Count
## 1 5.616442 12412308
## 2 5.603960 13299017
## 3 5.611822 14485546
## 4 5.655266 15816603
## 5 5.718922 17075727
## 6 5.739174 18110657
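Because `right_join` keeps every row of `world_pop`, any country-year without a match in `deaths_df` ends up with `NA` in the death columns, which is presumably why the continent subsets below are wrapped in `na.omit()`. The sketch below illustrates the difference on toy data; `toy_deaths`, `toy_pop`, and the "Hypothetical Island" row are made up for the example, and `inner_join` is shown only as an alternative, not as what the report used.

```r
library(dplyr)

# Toy tables; the Canada values come from the tables above, and the
# "Hypothetical Island" row is invented to show an unmatched key.
toy_deaths <- data.frame(
  country      = c("Canada", "Canada"),
  year         = c(1996, 1997),
  total_deaths = c(22.18101, 21.92768)
)
toy_pop <- data.frame(
  Country.Name = c("Canada", "Canada", "Hypothetical Island"),
  Year         = c(1996, 1997, 1996),
  Count        = c(29610218, 29905948, 1000)
)

keys <- c("country" = "Country.Name", "year" = "Year")

# right_join keeps every population row, so the unmatched row gets NA deaths.
right_rows <- right_join(toy_deaths, toy_pop, by = keys)

# inner_join keeps only country-years present in both tables.
inner_rows <- inner_join(toy_deaths, toy_pop, by = keys)

nrow(right_rows)           # 3 rows, one with NA in total_deaths
nrow(inner_rows)           # 2 rows
nrow(na.omit(right_rows))  # 2 rows, same as the inner join
```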
North America
north_america <- joined_all %>% filter(country %in% c("United States", "Canada"))
head(na.omit(north_america))
## country acronym year total_deaths indoor_deaths outdoor_deaths ozone_deaths
## 1 Canada CAN 1990 23.74844 0.1461597 21.82110 2.024766
## 2 Canada CAN 1991 23.34036 0.1347912 21.40547 2.046623
## 3 Canada CAN 1992 23.00947 0.1247982 21.06392 2.069720
## 4 Canada CAN 1993 23.03293 0.1191081 21.03444 2.135114
## 5 Canada CAN 1994 22.60288 0.1107671 20.59547 2.152504
## 6 Canada CAN 1995 22.32566 0.1015955 20.28851 2.193303
## Count
## 1 27691138
## 2 28037420
## 3 28371264
## 4 28684764
## 5 29000663
## 6 29302311
South America
south_america <- joined_all %>% filter(country %in% c("Brazil", "Chile"))
head(na.omit(south_america))
## country acronym year total_deaths indoor_deaths outdoor_deaths ozone_deaths
## 1 Brazil BRA 1990 74.96820 44.08928 28.36460 3.330584
## 2 Brazil BRA 1991 71.52505 41.12989 27.91653 3.272506
## 3 Brazil BRA 1992 69.97594 39.07269 28.37737 3.321153
## 4 Brazil BRA 1993 69.34644 37.34668 29.37063 3.439490
## 5 Brazil BRA 1994 66.74580 34.60871 29.48986 3.445359
## 6 Brazil BRA 1995 63.54859 31.67095 29.22721 3.430127
## Count
## 1 149003223
## 2 151648011
## 3 154259380
## 4 156849078
## 5 159432716
## 6 162019896
Africa
africa <- joined_all %>% filter(country %in% c("Nigeria", "Malawi"))
head(na.omit(africa))
## country acronym year total_deaths indoor_deaths outdoor_deaths ozone_deaths
## 1 Malawi MWI 1990 167.7156 153.3657 12.60813 3.518561
## 2 Malawi MWI 1991 167.8769 153.3428 12.77371 3.541273
## 3 Malawi MWI 1992 171.1963 156.2008 13.19234 3.618770
## 4 Malawi MWI 1993 175.2565 159.9608 13.45895 3.686304
## 5 Malawi MWI 1994 180.9753 164.9773 14.10506 3.784780
## 6 Malawi MWI 1995 183.4036 166.9812 14.48956 3.847709
## Count
## 1 9404500
## 2 9600355
## 3 9685973
## 4 9710331
## 5 9745690
## 6 9844415
Europe
europe <- joined_all %>% filter(country %in% c("Germany", "Serbia"))
head(na.omit(europe))
## country acronym year total_deaths indoor_deaths outdoor_deaths ozone_deaths
## 1 Germany DEU 1990 41.91322 1.600590 38.11494 2.724651
## 2 Germany DEU 1991 40.73815 1.472532 37.08854 2.694316
## 3 Germany DEU 1992 38.94425 1.367432 35.45345 2.622836
## 4 Germany DEU 1993 38.25349 1.275528 34.85003 2.623219
## 5 Germany DEU 1994 36.85860 1.182584 33.58411 2.573705
## 6 Germany DEU 1995 35.66449 1.109101 32.47285 2.557293
## Count
## 1 79433029
## 2 80013896
## 3 80624598
## 4 81156363
## 5 81438348
## 6 81678051
Asia
asia <- joined_all %>% filter(country %in% c("Pakistan", "Sri Lanka"))
head(na.omit(asia))
## country acronym year total_deaths indoor_deaths outdoor_deaths ozone_deaths
## 1 Pakistan PAK 1990 144.7155 104.4196 34.80304 10.09603
## 2 Pakistan PAK 1991 148.0120 105.5436 36.80428 10.35961
## 3 Pakistan PAK 1992 148.6560 105.2133 37.76577 10.35540
## 4 Pakistan PAK 1993 149.6526 104.9854 38.95704 10.37194
## 5 Pakistan PAK 1994 151.1992 105.3557 40.06784 10.44016
## 6 Pakistan PAK 1995 154.9523 107.2959 41.72728 10.67907
## Count
## 1 107647921
## 2 110778648
## 3 113911126
## 4 117086685
## 5 120362762
## 6 123776839
Oceania
oceania <- joined_all %>% filter(country %in% c("Australia", "New Zealand"))
head(na.omit(oceania))
## country acronym year total_deaths indoor_deaths outdoor_deaths ozone_deaths
## 1 Australia AUS 1990 26.70503 0.6924006 25.72983 0.3285590
## 2 Australia AUS 1991 25.91503 0.6172074 25.02097 0.3222915
## 3 Australia AUS 1992 25.70745 0.5594191 24.86599 0.3286297
## 4 Australia AUS 1993 24.63559 0.4920491 23.86602 0.3232958
## 5 Australia AUS 1994 24.38185 0.4454673 23.65269 0.3300999
## 6 Australia AUS 1995 23.10038 0.3895721 22.43122 0.3244735
## Count
## 1 17065100
## 2 17284000
## 3 17495000
## 4 17667000
## 5 17855000
## 6 18072000
This is a closer view of the population growth over time in both the
high- and low-population countries that we selected.
This graph shows the population change over time. Note that the lines
for Germany and Australia seem relatively flat, but a closer look shows
a gradual increase that is much slower than in the other countries.
These graphs show the same information, but we have added the
percentage of air-pollution-related deaths as the width of the line to
show visually whether deaths increased or decreased over time. It is
easier to see in the high-population countries, but air-pollution-related
deaths do decrease over time.
Death Count
Which country has the highest average death count?
Let’s make a table of the high- and low-population countries and their
respective death counts due to pollution.
| country | hp_average_death |
|---|---|
| Australia | 17.76815 |
| Brazil | 48.42928 |
| Germany | 28.10988 |
| Nigeria | 112.30157 |
| Pakistan | 144.33463 |
| United States | 26.35827 |

| country | lp_average_death |
|---|---|
| Canada | 18.18542 |
| Chile | 36.51321 |
| Malawi | 147.77167 |
| New Zealand | 15.92536 |
| Serbia | 80.66558 |
| Sri Lanka | 69.60383 |
We wanted to take a closer look at the death count and see which
country has the highest average death count. In the tables we made you
can see that Pakistan had the highest average death count at 144.33 for
the high populated countries. Malawi had the highest average death count
at 147.77 for the low populated countries, which is higher than
Pakistan.
Let’s see how this is different from continent to continent
#Mean total deaths for each continent
deaths_north <- na.omit(north_america) %>%
group_by(country) %>%
summarize(north_america_deaths = mean(total_deaths))
deaths_south <- na.omit(south_america) %>%
group_by(country) %>%
summarize(south_america_deaths = mean(total_deaths))
deaths_africa <- na.omit(africa) %>%
group_by(country) %>%
summarize(africa_deaths = mean(total_deaths))
deaths_europe <- na.omit(europe) %>%
group_by(country) %>%
summarize(europe_deaths = mean(total_deaths))
deaths_asia <- na.omit(asia) %>%
group_by(country) %>%
summarize(asia_deaths = mean(total_deaths))
deaths_oceania <- na.omit(oceania) %>%
group_by(country) %>%
summarize(oceania_deaths = mean(total_deaths))
#Table to view continent deaths
kable(deaths_north, caption = "North America Average Death Count")
North America Average Death Count
| country | north_america_deaths |
|---|---|
| Canada | 18.18542 |
| United States | 26.35827 |
kable(deaths_south, caption = "South America Average Death Count")
South America Average Death Count
| country | south_america_deaths |
|---|---|
| Brazil | 48.42928 |
| Chile | 36.51321 |
kable(deaths_africa, caption = "Africa Average Death Count")
Africa Average Death Count
| country | africa_deaths |
|---|---|
| Malawi | 147.7717 |
| Nigeria | 112.3016 |
kable(deaths_asia, caption = "Asia Average Death Count")
Asia Average Death Count
| country | asia_deaths |
|---|---|
| Pakistan | 144.33463 |
| Sri Lanka | 69.60383 |
kable(deaths_europe, caption = "Europe Average Death Count")
Europe Average Death Count
| country | europe_deaths |
|---|---|
| Germany | 28.10988 |
| Serbia | 80.66558 |
kable(deaths_oceania, caption = "Oceania Average Death Count")
Oceania Average Death Count
| country | oceania_deaths |
|---|---|
| Australia | 17.76815 |
| New Zealand | 15.92536 |
When we look at the average death count by continent, we can see that
the Oceania countries had the fewest deaths overall: Australia averaged
roughly 17.8 and New Zealand averaged 15.9. The African countries had
the most: Malawi averaged 147.8 and Nigeria averaged 112.3.
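The six near-identical summaries could also be written as one grouped summary by attaching a continent label to each country first. The block below is a sketch of that alternative, not the code used for the tables: `continent_lookup`, `toy_joined`, and `avg_by_country` are hypothetical names, the lookup is built from the Continents list earlier in this report, and the toy rows are the 1996 and 1997 rates for Australia and Canada.

```r
library(dplyr)

# Lookup table built from the "Continents" list earlier in this report.
continent_lookup <- data.frame(
  country = c("United States", "Canada", "Brazil", "Chile",
              "Nigeria", "Malawi", "Germany", "Serbia",
              "Pakistan", "Sri Lanka", "Australia", "New Zealand"),
  continent = rep(c("North America", "South America", "Africa",
                    "Europe", "Asia", "Oceania"), each = 2)
)

# Toy stand-in for joined_all (two years for two countries).
toy_joined <- data.frame(
  country      = c("Australia", "Australia", "Canada", "Canada"),
  year         = c(1996, 1997, 1996, 1997),
  total_deaths = c(23.04465, 22.43025, 22.18101, 21.92768)
)

# One grouped summary replaces the per-continent blocks.
avg_by_country <- toy_joined %>%
  inner_join(continent_lookup, by = "country") %>%
  group_by(continent, country) %>%
  summarize(avg_total_deaths = mean(total_deaths), .groups = "drop")

avg_by_country
```

With the full data, filtering or faceting on `continent` would give the same per-continent views without six separate objects.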
Here’s a graph to clearly visualize the previous table
To get a better visualization we created a bar graph of the average
deaths in both the high and low populated countries. In this
high-population graph you can see that Pakistan is at the highest and
Australia is at the lowest. In the low-population graph you can see that
Malawi is at the highest and New Zealand is at the lowest.
So we’ve looked at the deaths due to pollution, but what percentage
of the population was affected?
In order to get rid of the leading zeros, and clean up the y-axis, we
multiplied the ‘percent_high’ and ‘percent_low’ by 100,000 since the
data was per 100,000 when calculating deaths.
| Country.Name | average_population |
|---|---|
| Australia | 21085646 |
| Brazil | 188017856 |
| Germany | 81914553 |
| Nigeria | 146828087 |
| Pakistan | 166653684 |
| United States | 299036073 |

| Country.Name | average_population |
|---|---|
| Canada | 32874340 |
| Chile | 16466330 |
| Malawi | 13442531 |
| New Zealand | 4193041 |
| Serbia | 7358242 |
| Sri Lanka | 19758408 |
Now that we’ve looked at the deaths due to pollution, we wanted to see
what share of the population was actually affected. Above are tables of
the average populations in both the high- and low-population countries.
Among the high-population countries Pakistan leads with a value of 12.1,
and among the low-population countries Malawi leads with 130.9. Because
these figures were multiplied by 100,000, they are scaled values rather
than literal percentages of the population.
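As a cross-check, a rate per 100,000 can be converted directly into a percentage and into an approximate number of deaths. The sketch below does this for Pakistan, assuming that `total_deaths` is an annual rate per 100,000 people (as the original column names state) and using the averages from the tables above; it is a different reading from the scaled figures quoted in the text, and the variable names are only illustrative.

```r
# Average rate (deaths per 100,000) and average population for Pakistan,
# copied from the tables in this report.
rate_per_100k  <- 144.33463
avg_population <- 166653684

# A rate per 100,000 becomes a percentage via rate / 100000 * 100.
percent_of_population <- rate_per_100k / 100000 * 100

# Approximate absolute deaths per year implied by that rate.
approx_deaths <- rate_per_100k / 100000 * avg_population

percent_of_population
round(approx_deaths)
```

Under that assumption the affected share is a fraction of one percent, even for the country with the highest rate among the high-population group.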
Pollution Types
Which type of pollution has the greatest number of deaths?
Looking between indoor, outdoor, and ozone pollution deaths, we can
see which pollutant-type had the greatest death count.
| country | avg_indoor | avg_outdoor | avg_ozone |
|---|---|---|---|
| Pakistan | 87.7427944 | 50.52063 | 10.440656 |
| Nigeria | 75.8755074 | 35.21678 | 2.117076 |
| Brazil | 19.4258385 | 26.84194 | 2.740342 |
| Germany | 0.7170881 | 25.47078 | 2.343892 |
| Australia | 0.2485867 | 17.20789 | 0.360452 |
| United States | 0.1656402 | 22.79947 | 3.915093 |
In Pakistan and Nigeria the average indoor death count is higher than
the other pollutant types, while in the other four countries outdoor
deaths are highest. In all three categories, Pakistan had the highest
death count.
#Low Population Pollutant Averages
low_poll <- deaths_df %>%
group_by(country) %>%
filter(country %in% c('Canada', 'Chile', 'Malawi', 'Serbia', 'Sri Lanka', 'New Zealand')) %>%
select(country, indoor_deaths, outdoor_deaths, ozone_deaths) %>%
summarize(avg_indoor = mean(indoor_deaths), avg_outdoor = mean(outdoor_deaths), avg_ozone = mean(ozone_deaths))
kable(low_poll)
| country | avg_indoor | avg_outdoor | avg_ozone |
|---|---|---|---|
| Canada | 0.0651156 | 16.38423 | 1.9697041 |
| Chile | 8.6932699 | 27.17442 | 0.8504919 |
| Malawi | 132.1891749 | 13.81151 | 3.3870514 |
| New Zealand | 0.2908622 | 15.56872 | 0.0727512 |
| Serbia | 35.8762796 | 42.71254 | 2.9395671 |
| Sri Lanka | 44.5428441 | 24.77233 | 0.4304406 |
In the low-population countries, Malawi has a higher average indoor
death count than any of our other selected countries. It also has the
highest average ozone deaths, but its outdoor average is the lowest of
the group; Serbia has the highest outdoor average.
In these graphs, it is much easier to see the discrepancies in the
indoor deaths of the countries. Malawi, Pakistan, and Nigeria have dots
located high on the graph indicating high death counts.
# High Outdoor Air Pollution
# Point size is a fixed setting, so it is set in geom_point() rather than inside aes()
h_outdoor <- ggplot(high_poll, aes(x = country, y = avg_outdoor, color = avg_ozone)) +
scale_color_gradient2(low = "light pink", mid = "pink", high = "violet", aesthetics = "colour") +
geom_point(size = 5) +
labs(title = "Outdoor Air Pollution Deaths in High Population Countries") +
xlab("Country") +
ylab("Average Deaths")
ggplotly(h_outdoor)
#Low Outdoor Air Pollution
l_outdoor <- ggplot(low_poll, aes(x = country, y = avg_outdoor, color = avg_ozone)) +
scale_color_gradient2(low = "light pink", mid = "pink", high = "violet", aesthetics = "colour") +
geom_point(size = 5) +
labs(title = "Outdoor Air Pollution Deaths in Low Population Countries") +
xlab("Country") +
ylab("Average Deaths")
ggplotly(l_outdoor)
In these graphs, the outdoor pollution deaths are displayed on the
y-axis, with the amount of ozone-related deaths indicated by the color
gradient. The dots highest on the graph have the most outdoor-related
deaths, and a darker shade indicates more ozone-related deaths.
For our high-population countries, Pakistan had high outdoor deaths and
ozone deaths, while the other high-population countries were lower in
ozone deaths. In the low-population countries, Serbia had the highest
outdoor deaths and the second-highest ozone deaths, while Malawi had the
greatest ozone deaths.
Pollution Over Time
Let’s look at the previous two decades and compare the death count:
has there been a change?
To see if there’s been a change over time we looked at 2 decades. The
first decade was from 1996-2006. Here we can see that there’s a general
decrease in both high and low populated countries over the years even
though some countries have a higher death count than others, such as
Nigeria, Pakistan, Malawi, and Tonga.
This is the first decade 1996-2006
| country | High_Deaths_96 | High_Deaths_01 | High_Deaths_06 |
|---|---|---|---|
| Australia | 23.04465 | 18.58572 | 14.92239 |
| Brazil | 60.67757 | 49.46436 | 41.46829 |
| Germany | 34.72325 | 28.38756 | 23.83654 |
| Nigeria | 136.08978 | 123.05129 | 102.26653 |
| Pakistan | 155.42988 | 151.25352 | 146.09296 |
| United States | 29.99271 | 28.93114 | 25.93369 |

| country | Low_Deaths_96 | Low_Deaths_01 | Low_Deaths_06 |
|---|---|---|---|
| Canada | 22.18101 | 19.82451 | 17.14391 |
| Chile | 46.36829 | 37.43188 | 30.99058 |
| Malawi | 183.14179 | 165.41702 | 137.54033 |
| Serbia | 93.44700 | 83.18333 | 79.04236 |
| Sri Lanka | 85.28997 | 72.16239 | 66.04455 |
| Tonga | 100.66078 | 95.27073 | 88.65608 |
The second decade was from 2007-2017 and there is still a decrease in
both high and low populated countries over the years. You can also see
that the death counts in the second decade are lower than the death
counts in the first decade. If you look at Pakistan you can see the
death count started with 155.4 in 1996 and went down to 143.8 in
2007.
This is the second decade 2007-2017
| country | High_Deaths_07 | High_Deaths_12 | High_Deaths_17 |
|---|---|---|---|
| Australia | 14.92140 | 12.65973 | 10.79595 |
| Brazil | 40.42460 | 35.39069 | 30.32108 |
| Germany | 23.45850 | 20.91536 | 19.82826 |
| Nigeria | 98.90306 | 84.22324 | 81.22147 |
| Pakistan | 143.81724 | 133.93887 | 123.21548 |
| United States | 25.11756 | 21.98194 | 18.82515 |

| country | Low_Deaths_07 | Low_Deaths_12 | Low_Deaths_17 |
|---|---|---|---|
| Canada | 16.93196 | 13.82968 | 10.71662 |
| Chile | 30.53130 | 27.31475 | 24.29921 |
| Malawi | 132.12253 | 116.27470 | 104.93508 |
| Serbia | 76.65752 | 72.77354 | 62.57853 |
| Sri Lanka | 66.05987 | 59.22433 | 38.46264 |
| Tonga | 87.81178 | 79.49336 | 70.72940 |
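The code behind the decade snapshot tables is not shown. One way such a table might be built in a single step is to filter to the snapshot years and pivot them into columns; the block below is a sketch under that assumption, with `toy_deaths`, `snapshots`, and the `deaths_` column prefix as illustrative names and the Australia and Canada rates copied from the tables above.

```r
library(dplyr)
library(tidyr)

# Toy stand-in for deaths_df, limited to two countries and three years.
toy_deaths <- data.frame(
  country      = rep(c("Australia", "Canada"), each = 3),
  year         = rep(c(1996, 2001, 2006), times = 2),
  total_deaths = c(23.04465, 18.58572, 14.92239,
                   22.18101, 19.82451, 17.14391)
)

# Keep the snapshot years, spread them into columns, and add the change.
snapshots <- toy_deaths %>%
  filter(year %in% c(1996, 2001, 2006)) %>%
  select(country, year, total_deaths) %>%
  pivot_wider(names_from = year, values_from = total_deaths,
              names_prefix = "deaths_") %>%
  mutate(change_96_06 = deaths_2006 - deaths_1996)

snapshots
```

A negative `change_96_06` corresponds to the decrease described in the text.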
Let’s see if there is variation by continent. Here are some tables
for the first decade (1996-2006) and second decade (2007-2017) grouped
by continent.
#North America 1996-2006
north_96 <- na.omit(north_america) %>%
group_by(country) %>%
filter(year == 1996) %>%
summarize(avg_deaths_96 = mean(total_deaths))
north_01 <- na.omit(north_america) %>%
group_by(country) %>%
filter(year == 2001) %>%
summarize(avg_deaths_01 = mean(total_deaths))
north_06 <- na.omit(north_america) %>%
group_by(country) %>%
filter(year == 2006) %>%
summarize(avg_deaths_06 = mean(total_deaths))
kable(list(north_96,north_01,north_06), caption = "North America Deaths 1996-2006")
North America Deaths 1996-2006
| country | avg_deaths_96 |
|---|---|
| Canada | 22.18101 |
| United States | 29.99271 |

| country | avg_deaths_01 |
|---|---|
| Canada | 19.82451 |
| United States | 28.93114 |

| country | avg_deaths_06 |
|---|---|
| Canada | 17.14391 |
| United States | 25.93369 |
# North America 2007-2017
north_07 <- na.omit(north_america) %>%
group_by(country) %>%
filter(year == 2007) %>%
summarize(avg_deaths_07 = mean(total_deaths))
north_12 <- na.omit(north_america) %>%
group_by(country) %>%
filter(year == 2012) %>%
summarize(avg_deaths_12 = mean(total_deaths))
north_17 <- na.omit(north_america) %>%
group_by(country) %>%
filter(year == 2017) %>%
summarize(avg_deaths_17 = mean(total_deaths))
kable(list(north_07,north_12,north_17), caption = "North America Deaths 2007-2017")
North America Deaths 2007-2017
| country | avg_deaths_07 |
|---|---|
| Canada | 16.93196 |
| United States | 25.11756 |

| country | avg_deaths_12 |
|---|---|
| Canada | 13.82968 |
| United States | 21.98194 |

| country | avg_deaths_17 |
|---|---|
| Canada | 10.71662 |
| United States | 18.82515 |
#South America 1996-2006
south_96 <- na.omit(south_america) %>%
group_by(country) %>%
filter(year == 1996) %>%
summarize(avg_deaths_96 = mean(total_deaths))
south_01 <- na.omit(south_america) %>%
group_by(country) %>%
filter(year == 2001) %>%
summarize(avg_deaths_01 = mean(total_deaths))
south_06 <- na.omit(south_america) %>%
group_by(country) %>%
filter(year == 2006) %>%
summarize(avg_deaths_06 = mean(total_deaths))
kable(list(south_96,south_01,south_06), caption = "South America Deaths 1996-2006")
South America Deaths 1996-2006
| country | avg_deaths_96 |
|---|---|
| Brazil | 60.67757 |
| Chile | 46.36829 |

| country | avg_deaths_01 |
|---|---|
| Brazil | 49.46436 |
| Chile | 37.43188 |

| country | avg_deaths_06 |
|---|---|
| Brazil | 41.46829 |
| Chile | 30.99058 |
# South America 2007-2017
south_07 <- na.omit(south_america) %>%
  group_by(country) %>%
  filter(year == 2007) %>%
  summarize(avg_deaths_07 = mean(total_deaths))
south_12 <- na.omit(south_america) %>%
  group_by(country) %>%
  filter(year == 2012) %>%
  summarize(avg_deaths_12 = mean(total_deaths))
south_17 <- na.omit(south_america) %>%
  group_by(country) %>%
  filter(year == 2017) %>%
  summarize(avg_deaths_17 = mean(total_deaths))
kable(list(south_07, south_12, south_17), caption = "South America Deaths 2007-2017")
South America Deaths 2007-2017
| country | avg_deaths_07 | avg_deaths_12 | avg_deaths_17 |
|---|---|---|---|
| Brazil | 40.4246 | 35.39069 | 30.32108 |
| Chile | 30.5313 | 27.31475 | 24.29921 |
# Africa 1996-2006
africa_96 <- na.omit(africa) %>%
  group_by(country) %>%
  filter(year == 1996) %>%
  summarize(avg_deaths_96 = mean(total_deaths))
africa_01 <- na.omit(africa) %>%
  group_by(country) %>%
  filter(year == 2001) %>%
  summarize(avg_deaths_01 = mean(total_deaths))
africa_06 <- na.omit(africa) %>%
  group_by(country) %>%
  filter(year == 2006) %>%
  summarize(avg_deaths_06 = mean(total_deaths))
kable(list(africa_96, africa_01, africa_06), caption = "Africa Deaths 1996-2006")
Africa Deaths 1996-2006
| country | avg_deaths_96 | avg_deaths_01 | avg_deaths_06 |
|---|---|---|---|
| Malawi | 183.1418 | 165.4170 | 137.5403 |
| Nigeria | 136.0898 | 123.0513 | 102.2665 |
# Africa 2007-2017
africa_07 <- na.omit(africa) %>%
  group_by(country) %>%
  filter(year == 2007) %>%
  summarize(avg_deaths_07 = mean(total_deaths))
africa_12 <- na.omit(africa) %>%
  group_by(country) %>%
  filter(year == 2012) %>%
  summarize(avg_deaths_12 = mean(total_deaths))
africa_17 <- na.omit(africa) %>%
  group_by(country) %>%
  filter(year == 2017) %>%
  summarize(avg_deaths_17 = mean(total_deaths))
kable(list(africa_07, africa_12, africa_17), caption = "Africa Deaths 2007-2017")
Africa Deaths 2007-2017
| country | avg_deaths_07 | avg_deaths_12 | avg_deaths_17 |
|---|---|---|---|
| Malawi | 132.12253 | 116.27470 | 104.93508 |
| Nigeria | 98.90306 | 84.22324 | 81.22147 |
# Europe 1996-2006
europe_96 <- na.omit(europe) %>%
  group_by(country) %>%
  filter(year == 1996) %>%
  summarize(avg_deaths_96 = mean(total_deaths))
europe_01 <- na.omit(europe) %>%
  group_by(country) %>%
  filter(year == 2001) %>%
  summarize(avg_deaths_01 = mean(total_deaths))
europe_06 <- na.omit(europe) %>%
  group_by(country) %>%
  filter(year == 2006) %>%
  summarize(avg_deaths_06 = mean(total_deaths))
kable(list(europe_96, europe_01, europe_06), caption = "Europe Deaths 1996-2006")
Europe Deaths 1996-2006
| country | avg_deaths_96 | avg_deaths_01 | avg_deaths_06 |
|---|---|---|---|
| Germany | 34.72325 | 28.38756 | 23.83654 |
| Serbia | 93.44700 | 83.18333 | 79.04236 |
# Europe 2007-2017
europe_07 <- na.omit(europe) %>%
  group_by(country) %>%
  filter(year == 2007) %>%
  summarize(avg_deaths_07 = mean(total_deaths))
europe_12 <- na.omit(europe) %>%
  group_by(country) %>%
  filter(year == 2012) %>%
  summarize(avg_deaths_12 = mean(total_deaths))
europe_17 <- na.omit(europe) %>%
  group_by(country) %>%
  filter(year == 2017) %>%
  summarize(avg_deaths_17 = mean(total_deaths))
kable(list(europe_07, europe_12, europe_17), caption = "Europe Deaths 2007-2017")
Europe Deaths 2007-2017
| country | avg_deaths_07 | avg_deaths_12 | avg_deaths_17 |
|---|---|---|---|
| Germany | 23.45850 | 20.91536 | 19.82826 |
| Serbia | 76.65752 | 72.77354 | 62.57853 |
# Asia 1996-2006
asia_96 <- na.omit(asia) %>%
  group_by(country) %>%
  filter(year == 1996) %>%
  summarize(avg_deaths_96 = mean(total_deaths))
asia_01 <- na.omit(asia) %>%
  group_by(country) %>%
  filter(year == 2001) %>%
  summarize(avg_deaths_01 = mean(total_deaths))
asia_06 <- na.omit(asia) %>%
  group_by(country) %>%
  filter(year == 2006) %>%
  summarize(avg_deaths_06 = mean(total_deaths))
kable(list(asia_96, asia_01, asia_06), caption = "Asia Deaths 1996-2006")
Asia Deaths 1996-2006
| country | avg_deaths_96 | avg_deaths_01 | avg_deaths_06 |
|---|---|---|---|
| Pakistan | 155.42988 | 151.25352 | 146.09296 |
| Sri Lanka | 85.28997 | 72.16239 | 66.04455 |
# Asia 2007-2017
asia_07 <- na.omit(asia) %>%
  group_by(country) %>%
  filter(year == 2007) %>%
  summarize(avg_deaths_07 = mean(total_deaths))
asia_12 <- na.omit(asia) %>%
  group_by(country) %>%
  filter(year == 2012) %>%
  summarize(avg_deaths_12 = mean(total_deaths))
asia_17 <- na.omit(asia) %>%
  group_by(country) %>%
  filter(year == 2017) %>%
  summarize(avg_deaths_17 = mean(total_deaths))
kable(list(asia_07, asia_12, asia_17), caption = "Asia Deaths 2007-2017")
Asia Deaths 2007-2017
| country | avg_deaths_07 | avg_deaths_12 | avg_deaths_17 |
|---|---|---|---|
| Pakistan | 143.81724 | 133.93887 | 123.21548 |
| Sri Lanka | 66.05987 | 59.22433 | 38.46264 |
# Oceania 1996-2006
oceania_96 <- na.omit(oceania) %>%
  group_by(country) %>%
  filter(year == 1996) %>%
  summarize(avg_deaths_96 = mean(total_deaths))
oceania_01 <- na.omit(oceania) %>%
  group_by(country) %>%
  filter(year == 2001) %>%
  summarize(avg_deaths_01 = mean(total_deaths))
oceania_06 <- na.omit(oceania) %>%
  group_by(country) %>%
  filter(year == 2006) %>%
  summarize(avg_deaths_06 = mean(total_deaths))
kable(list(oceania_96, oceania_01, oceania_06), caption = "Oceania Deaths 1996-2006")
Oceania Deaths 1996-2006
| country | avg_deaths_96 | avg_deaths_01 | avg_deaths_06 |
|---|---|---|---|
| Australia | 23.04465 | 18.58572 | 14.92239 |
| New Zealand | 21.15988 | 16.91014 | 13.76706 |
# Oceania 2007-2017
oceania_07 <- na.omit(oceania) %>%
  group_by(country) %>%
  filter(year == 2007) %>%
  summarize(avg_deaths_07 = mean(total_deaths))
oceania_12 <- na.omit(oceania) %>%
  group_by(country) %>%
  filter(year == 2012) %>%
  summarize(avg_deaths_12 = mean(total_deaths))
oceania_17 <- na.omit(oceania) %>%
  group_by(country) %>%
  filter(year == 2017) %>%
  summarize(avg_deaths_17 = mean(total_deaths))
kable(list(oceania_07, oceania_12, oceania_17), caption = "Oceania Deaths 2007-2017")
Oceania Deaths 2007-2017
| country | avg_deaths_07 | avg_deaths_12 | avg_deaths_17 |
|---|---|---|---|
| Australia | 14.92140 | 12.65973 | 10.795952 |
| New Zealand | 13.58658 | 10.91224 | 8.598757 |
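The blocks above repeat the same filter-and-summarize steps for every continent and year. As a sketch of how that pattern might be condensed, the hypothetical helper `avg_deaths_by_year()` below (not part of our original analysis) takes a continent data frame and a vector of years and returns one column per year. The `oceania_demo` data frame is only a stand-in for `oceania`, built from the Oceania 2007-2017 table values and assuming one row per country and year.

```r
library(dplyr)
library(tidyr)

# Stand-in for the `oceania` data frame (an assumption for this example),
# filled with the rates (deaths per 100,000) from the Oceania 2007-2017 table.
oceania_demo <- data.frame(
  country = rep(c("Australia", "New Zealand"), each = 3),
  year = rep(c(2007, 2012, 2017), times = 2),
  total_deaths = c(14.92140, 12.65973, 10.795952,
                   13.58658, 10.91224, 8.598757)
)

# Hypothetical helper: average total_deaths per country for each requested
# year, returned as one wide table with a column per year.
avg_deaths_by_year <- function(df, years) {
  df %>%
    na.omit() %>%
    filter(year %in% years) %>%
    group_by(country, year) %>%
    summarize(avg_deaths = mean(total_deaths), .groups = "drop") %>%
    pivot_wider(names_from = year, values_from = avg_deaths,
                names_prefix = "avg_deaths_")
}

oceania_second <- avg_deaths_by_year(oceania_demo, c(2007, 2012, 2017))
print(oceania_second)
```

Note that the resulting columns are named with four-digit years (for example `avg_deaths_2007`) rather than the two-digit suffixes used in the tables above.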
Based on these tables, we can see that the Oceania countries generally
had the lowest death rates (deaths per 100,000 people) across both
decades. In Europe, although Germany is a high-population country, its
death rate was significantly lower than that of Serbia, a
low-population country. The same pattern appears in Africa, where
Malawi, a low-population country, had a higher death rate overall than
Nigeria, a high-population country. Africa also showed the largest
decrease in average death rate in the first decade: Malawi's rate fell
by roughly 45.6 deaths per 100,000, from 183.1 in 1996 to 137.5 in 2006.
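To make the size of that decrease concrete, the sketch below recomputes the change for the two African countries directly from the Africa 1996-2006 table. The values are rates per 100,000 people, as in the original column names. `africa_change` and its derived columns are names introduced here for illustration only.

```r
# Rates (deaths per 100,000) copied from the Africa 1996-2006 table.
africa_change <- data.frame(
  country = c("Malawi", "Nigeria"),
  avg_deaths_96 = c(183.1418, 136.0898),
  avg_deaths_06 = c(137.5403, 102.2665)
)

# Absolute change in the rate, and the change relative to the 1996 value.
africa_change$abs_change <- africa_change$avg_deaths_06 - africa_change$avg_deaths_96
africa_change$pct_change <- 100 * africa_change$abs_change / africa_change$avg_deaths_96

print(africa_change)
```

For Malawi, the absolute change works out to about -45.6 deaths per 100,000, roughly a quarter of its 1996 rate.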
Let’s graph the previous tables!
In the first graph, we facet by high-population country and plot the
average death count in 1996 on the x-axis and the average death count
in 2001 on the y-axis, coloring the points by the average death count
in 2006. The darker the point, the higher the death count was in 2006.
For a closer look at the average death count in all 3 years, we also
created individual bar graphs for 1996, 2001, and 2006.
High Population first decade (1996-2006).
In the first graph, we facet by low-population country and plot the
average death count in 1996 on the x-axis and the average death count
in 2001 on the y-axis, coloring the points by the average death count
in 2006. The darker the point, the higher the death count was in 2006.
For a closer look at the average death count in all 3 years, we also
created individual bar graphs for 1996, 2001, and 2006.
Low Population first decade (1996-2006).
# Low Population Deaths 1996-2006
p_low_first <- all_low_first %>%
  group_by(country) %>%
  ggplot(aes(x = Low_Deaths_96, y = Low_Deaths_01, color = Low_Deaths_06)) +
  scale_color_gradient2(low = "light pink", mid = "pink", high = "dark violet", aesthetics = "color") +
  geom_point() +
  facet_wrap(~country)
interact_low_first <- p_low_first + labs(title = "Low Populated Countries", x = "1996 Deaths", y = "2001 Deaths")
ggplotly(interact_low_first)
# Low Population Deaths 1996
p_low_first <- all_low_first %>%
  group_by(country) %>%
  ggplot(aes(x = country, y = Low_Deaths_96, fill = country)) +
  scale_color_gradient2(low = "light pink", mid = "pink", high = "dark violet", aesthetics = "color") +
  xlab("Country") +
  ylab("1996 Deaths") +
  geom_col(position = 'dodge')
interact_low_first <- p_low_first + labs(title = "Low Populated Countries")
ggplotly(interact_low_first)
# Low Population Deaths 2001
p_low_first <- all_low_first %>%
  group_by(country) %>%
  ggplot(aes(x = country, y = Low_Deaths_01, fill = country)) +
  scale_color_gradient2(low = "light pink", mid = "pink", high = "dark violet", aesthetics = "color") +
  xlab("Country") +
  ylab("2001 Deaths") +
  geom_col(position = 'dodge')
interact_low_first <- p_low_first + labs(title = "Low Populated Countries")
ggplotly(interact_low_first)
# Low Population Deaths 2006
p_low_first <- all_low_first %>%
  group_by(country) %>%
  ggplot(aes(x = country, y = Low_Deaths_06, fill = country)) +
  scale_color_gradient2(low = "light pink", mid = "pink", high = "dark violet", aesthetics = "color") +
  xlab("Country") +
  ylab("2006 Deaths") +
  geom_col(position = 'dodge')
interact_low_first <- p_low_first + labs(title = "Low Populated Countries")
ggplotly(interact_low_first)
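The three bar charts above differ only in the year column they plot. One possible alternative, sketched below, reshapes the data to long format and draws all three years in a single faceted chart. `low_first_demo` is a stand-in for `all_low_first`, assuming one row per country, and holds three of the low-population countries with the values from the tables above. `low_first_long` and `p_low_first_all` are names introduced for this example.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Stand-in for `all_low_first` (an assumption), using table values for
# three of the low-population countries.
low_first_demo <- data.frame(
  country = c("Malawi", "Serbia", "Sri Lanka"),
  Low_Deaths_96 = c(183.1418, 93.44700, 85.28997),
  Low_Deaths_01 = c(165.4170, 83.18333, 72.16239),
  Low_Deaths_06 = c(137.5403, 79.04236, 66.04455)
)

# One row per country and year instead of one column per year.
low_first_long <- low_first_demo %>%
  pivot_longer(cols = starts_with("Low_Deaths_"),
               names_to = "year", names_prefix = "Low_Deaths_",
               values_to = "deaths") %>%
  mutate(year = factor(year, levels = c("96", "01", "06"),
                       labels = c("1996", "2001", "2006")))

# A single bar chart with one panel per year.
p_low_first_all <- ggplot(low_first_long,
                          aes(x = country, y = deaths, fill = country)) +
  geom_col() +
  facet_wrap(~year) +
  labs(title = "Low Populated Countries", x = "Country", y = "Deaths")
```

The resulting plot object could then be passed to `ggplotly()` in the same way as the charts above.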
In the first graph, we facet by high-population country and plot the
average death count in 2007 on the x-axis and the average death count
in 2012 on the y-axis, coloring the points by the average death count
in 2017. The darker the point, the higher the death count was in 2017.
For a closer look at the average death count in all 3 years, we also
created individual bar graphs for 2007, 2012, and 2017.
High Population second decade (2007-2017).
In the first graph, we facet by low-population country and plot the
average death count in 2007 on the x-axis and the average death count
in 2012 on the y-axis, coloring the points by the average death count
in 2017. The darker the point, the higher the death count was in 2017.
For a closer look at the average death count in all 3 years, we also
created individual bar graphs for 2007, 2012, and 2017.
Low Population second decade (2007-2017)
# Low Population Deaths 2007-2017
p_low_second <- all_low_second %>%
  group_by(country) %>%
  ggplot(aes(x = Low_Deaths_07, y = Low_Deaths_12, color = Low_Deaths_17)) +
  scale_color_gradient2(low = "light pink", mid = "pink", high = "dark violet", aesthetics = "color") +
  geom_point() +
  facet_wrap(~country)
interact_low_second <- p_low_second + labs(title = "Low Populated Countries", x = "2007 Deaths", y = "2012 Deaths")
ggplotly(interact_low_second)
# Low Population Deaths 2007
p_low_second_07 <- all_low_second %>%
  group_by(country) %>%
  ggplot(aes(x = country, y = Low_Deaths_07, fill = country)) +
  scale_color_gradient2(low = "light pink", mid = "pink", high = "dark violet", aesthetics = "color") +
  xlab("Country") +
  ylab("2007 Deaths") +
  geom_col(position = 'dodge')
interact_low_second_07 <- p_low_second_07 + labs(title = "Low Populated Countries")
ggplotly(interact_low_second_07)
# Low Population Deaths 2012
p_low_second_12 <- all_low_second %>%
  group_by(country) %>%
  ggplot(aes(x = country, y = Low_Deaths_12, fill = country)) +
  scale_color_gradient2(low = "light pink", mid = "pink", high = "dark violet", aesthetics = "color") +
  xlab("Country") +
  ylab("2012 Deaths") +
  geom_col(position = 'dodge')
interact_low_second_12 <- p_low_second_12 + labs(title = "Low Populated Countries")
ggplotly(interact_low_second_12)
# Low Population Deaths 2017
p_low_second_17 <- all_low_second %>%
  group_by(country) %>%
  ggplot(aes(x = country, y = Low_Deaths_17, fill = country)) +
  scale_color_gradient2(low = "light pink", mid = "pink", high = "dark violet", aesthetics = "color") +
  xlab("Country") +
  ylab("2017 Deaths") +
  geom_col(position = 'dodge')
interact_low_second_17 <- p_low_second_17 + labs(title = "Low Populated Countries")
ggplotly(interact_low_second_17)
By comparing each pollutant type, we can determine which year and
country had the highest number of deaths.
Let’s focus on Indoor Deaths first:
It is interesting to see indoor deaths decrease over time in both the
high- and low-population countries. The change is easiest to see in
Malawi's indoor deaths. Pakistan and Malawi were the two countries with
the highest indoor death counts in our subset. Despite being a
low-population country, Malawi had much larger death counts.
Outdoor Deaths:
For outdoor deaths, Pakistan leads among the high-population countries.
There is a notable increase in Nigeria from 2014 to 2015, and Germany
and the United States run neck and neck over the years. Among the
low-population countries, Serbia leads. Sri Lanka shows a steep decline
from 2015 to 2016, but before that it was very close to Tonga. Comparing
the top 2 countries, Pakistan and Serbia, Pakistan is drastically
higher. Outdoor deaths peaked in 2011 for Pakistan and in 1997 for
Serbia.
Ozone Deaths
Which is worse - outdoor or indoor pollution?
Let’s revisit a graph we looked at earlier. This time, we will combine
the pollutant types.
Among the high-population countries, 2 show higher indoor air pollutant
deaths, whereas the others show higher outdoor deaths. Among the
low-population countries, half show higher outdoor air pollutant deaths
and half show higher indoor air pollutant deaths. Therefore, we cannot
conclude which is worse.
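The comparison described above boils down to checking, per country, which pollutant column is larger on average. A minimal sketch of that check follows. Because the per-country values behind the graph are not printed in this report, the demo rows reuse the first two Afghanistan rows from the `glimpse()` output near the top. Afghanistan is not one of our selected countries and is used only to make the example runnable. `pollution_demo`, `worse_by_country`, and `worse_source` are names introduced here.

```r
library(dplyr)

# Demo rows taken from the glimpse() output (Afghanistan, 1990 and 1991);
# a stand-in for the real per-country subset.
pollution_demo <- data.frame(
  country = c("Afghanistan", "Afghanistan"),
  year = c(1990, 1991),
  indoor_deaths = c(250.3629, 242.5751),
  outdoor_deaths = c(46.44659, 46.03384)
)

# Average each pollutant type per country, then label the larger one.
worse_by_country <- pollution_demo %>%
  group_by(country) %>%
  summarize(avg_indoor = mean(indoor_deaths),
            avg_outdoor = mean(outdoor_deaths)) %>%
  mutate(worse_source = if_else(avg_indoor > avg_outdoor, "indoor", "outdoor"))

print(worse_by_country)
```

Applied to the full subset, counting the `worse_source` labels within each population group would give the kind of split described above.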
Summary
- Which country has the highest average death count?
  - High Population: Pakistan
  - Low Population: Malawi
- Has the percentage of the affected population decreased or increased
  over time?
  - Generally, it is decreasing for both high- and low-population
    countries
- Which pollutant type has the greatest number of deaths?
  - High Population: Indoor Pollution
  - Low Population: Indoor Pollution
- How has the death count changed over the past two decades?
  - 1996-2006:
    - High Population: Decreases
    - Low Population: Decreases
  - 2007-2017:
    - High Population: Decreases
    - Low Population: Decreases
- Which year and country had the highest number of deaths per
  pollutant type?
  - We looked at years 1996-2017
  - Indoor: Pakistan and Malawi were mainly affected in 1996
  - Outdoor: Serbia and Pakistan were the top countries
    - 2011 was the worst for Pakistan
    - 1997 was the worst for Serbia
    - Sri Lanka and Tonga increased, but Sri Lanka had a steep decrease
      after 2015
  - Outdoor Ozone: Pakistan and Malawi were the top countries
    - 1997 was the worst for Pakistan
    - 1998 was the worst for Malawi
    - The United States had the second-highest number of deaths among
      the high-population countries
    - Pakistan decreased and then slightly increased
- Which is worse - outdoor or indoor pollution?
  - Based on our sample set, it is inconclusive
  - “According to the EPA (Environmental Protection Agency), however,
    the levels of indoor air pollutants are often 2 to 5 times higher than
    outdoor levels, and in some cases these levels can exceed 100 times that
    of outdoor levels of the same pollutants. In other words, sometimes the
    air inside can be more harmful than the air outside.”
So many factors affect environmental data. Another area of interest
would be the cultural, geographical, and historical climate of each
country, to see whether any significant events skewed the numbers in
the years where large increases or decreases occurred.
To make the data more manageable, we had to reduce the number of
countries we looked at. This is why we selected one high-population
country and one low-population country from each continent. We thought
having both would give us a general idea of how air pollution affected
each continent. Furthermore, we excluded Russia, China, and India
because their populations are so large that they would not be a good
representation of average populations.
Overall, this project has sparked our interest in environmental
concerns. After seeing the impact we have on air pollution in our
environment, we are likely to be more conscious of how we treat our
planet. We hope that this data has brought more awareness to others as
well!